BOOTSTRAP CONFIDENCE INTERVALS 1

BootES: An R Package for Bootstrap Confidence Intervals on Effect Sizes

Authors

Abstract
Bootstrap Effect Sizes (bootES; Gerlanc & Kirby, 2012) is a free, open-source software package for R (R Development Core Team, 2012), which is a language and environment for statistical computing. BootES computes both unstandardized and standardized effect sizes (such as Cohen's d, Hedges's g, and Pearson's r), and makes easily available for the first time the computation of their bootstrap CIs. In this article we illustrate how to use bootES to find effect sizes for contrasts in between-subjects, within-subjects, and mixed factorial designs, and to find bootstrap CIs for correlations and differences between correlations. An appendix gives a brief introduction to R that will allow readers to use bootES without having prior knowledge of R.

BootES: An R Package for Bootstrap Confidence Intervals on Effect Sizes

Just before the turn of the millennium, the APA Task Force on Statistical Inference recommended that "Interval estimates should be given for any effect sizes involving principal outcomes" (Wilkinson, 1999, p. 599). This recommendation was adopted in subsequent editions of the Publication Manual of the American Psychological Association. For example, the current edition states that "complete reporting of ... estimates of appropriate effect sizes and confidence intervals are the minimum expectations for all APA journals" (APA, 2009, p. 33). Reporting confidence intervals (CIs) for effect sizes is the current "best practice" in social science research (Cumming & Fidler, 2009; Kline, 2004; Steiger, 2004; Thompson, 2002). Even so, psychologists and other social scientists have been slow to include CIs in their research reports. Persistence by Geoffrey Loftus raised the percentage of articles that reported CIs in Memory & Cognition from 8% to 45% during his tenure as editor, but the percentage fell to 27% after his term ended (Finch et al., 2004). Finch et al.
estimated that only about 22% of published experimental psychological research includes CIs.

One hurdle for reporting CIs is that standard statistical software packages provide only rudimentary support for CIs. It is difficult for authors to follow the APA Task Force's recommendation when the software that they know how to use will not compute CIs for all of their primary effects without the programming of specialized routines. Most standard programs compute CIs only for sample statistics or parameter estimates, such as means, and typically those CIs are computed using traditional methods that are known to have poor coverage (i.e., the proportion of randomly sampled CIs that contain, or cover, the fixed population parameter differs markedly from the purported confidence level). Kelley (2007a, 2007b) made a major contribution to available software with the introduction of MBESS, an R package for generating exact CIs for some effect-size measures (Kelley & Lai, 2012). Exact CIs on effect sizes are generated by finding the most extreme (upper and lower) population effect sizes whose theoretical cumulative probability distributions would yield the observed sample's effect size with a total probability equal to the desired level of confidence. These upper and lower extremes are found using a computationally intensive iterative procedure. The routines in the MBESS package were based on those described by Steiger and Fouladi (1992, 1997) and Steiger (2004). When the underlying data are normally distributed, these CIs are exact, in that they represent the theoretical best estimates. However, as Kelley himself has shown, exact CIs are not robust to departures from normality (Algina, Keselman, & Penfield, 2006; Kelley, 2005). When the data are known not to be normally distributed, or when the distribution is unknown (the most common situation for most research), a better approach is to use bootstrap CIs, which we describe in the next section.
Our purpose in this article is to illustrate the use of bootES (Bootstrap Effect Sizes; Gerlanc & Kirby, 2012), a free software package for R. We developed this package to make it easy for psychologists and other scientists to generate effect-size estimates and their bootstrap CIs. These include unstandardized and standardized effect-size estimates and their CIs for mean effects, mean differences, contrasts, correlations, and differences between correlations. Both between-subject and within-subject effect sizes, and mixtures of the two kinds, can be handled, as described below.

R is a free, open-source language and environment for statistical computing (R Development Core Team, 2012). It is rapidly increasing in popularity because of its price, power, and the development of specialized packages. R's reputation for a "steep learning curve" is well deserved, partly because R's thorough on-line documentation is written primarily by developers for developers. However, R is easy to use if one is shown how. In Appendix 1 and the supporting command files we provide all of the R commands that a user would need to know to compute all of the CIs described in this article, from importing the data to saving the results. Thus, the reader can perform all of the analyses in this article without consulting any other R documentation. Over time, the reader may then begin to use some of R's additional built-in functionality, such as its publication-quality graphics, or any of the excellent packages that have been developed for psychological research, such as MBESS and the psych package (Revelle, 2012).

Bootstrap Confidence Intervals

Confidence Intervals

Point estimates of effect sizes should always be accompanied by a measure of variability, and confidence intervals provide an especially informative measure (APA, 2009, p. 34). In this section we provide a brief introduction to bootstrap CIs.
For readers who want additional background we recommend Robertson (1991), Efron and Tibshirani (1993), and Carpenter and Bithell (2000) as good starting points.

A population parameter, θ, such as a population effect size, can usually be estimated by its corresponding sample statistic, θ̂. However, we also would like to know the precision of the estimate. What range of values of θ are plausible, given that we have observed θ̂? The range of plausible values is provided by a confidence interval (CI) (Efron & Tibshirani, 1993), where "plausibility" is defined by a specified probability (typically, 95%) that intervals of the type computed would cover the population parameter. (The "alpha level," α, is equal to 1 minus the desired confidence level expressed as a proportion; so, for a 95% CI, α = .05. It gives the nominal proportion of samples for which CIs would fail to cover the population parameter.) Finding a CI for θ̂ requires information about its population distribution, that is, the distribution of θ̂ that one would obtain from infinite sampling replications. Unfortunately, this distribution often is unknown. To fill this void, traditional parametric CI methods start with an assumption about the shape of the population distribution. For example, it is common to assume that a population of sample mean differences is distributed as (central) t. However, this will only be true under the null hypothesis and when the population scores are normally distributed. In contrast, exact CI methods assume, more appropriately, that the population of mean differences is distributed as a noncentral t, with a population parameter equal to the observed difference. In both methods the assumed theoretical distribution is used as the basis for constructing the CI around θ̂. CI coverage performance depends on the appropriateness of the distributional assumptions, and it is often quite poor (DiCiccio & Efron, 1996).
Bootstrap Methods

Bootstrap methods (Efron, 1979) approximate the unknown distribution of θ̂ from the data sample itself. This is done by repeatedly drawing random samples, called resamples, from the original data sample. These resamples contain the same number of data points, N, as the original sample, and because the values are drawn with replacement the same sample value can occur more than once within a resample. The desired statistic θ̂ is calculated anew for each resample; denote each resample statistic θ̂*. If the number of bootstrap resamples (R) were, say, 2000, then we would have 2000 θ̂*s. The distribution of these θ̂*s serves as an empirical approximation of the population distribution of θ̂, and we can use it to find a CI for θ̂. Most simply, to find a 95% CI, put the resampled values of θ̂* in rank order and locate the values at the 2.5th and 97.5th centiles in the distribution. This is called a "percentile" bootstrap CI. Such intervals make no assumptions about the shape of the distribution of θ̂.

Such simple percentile bootstrap CIs have an advantage in computational transparency, but they suffer from two sources of inaccuracy (DiCiccio & Efron, 1996; Efron, 1987). First, many sample statistics are biased estimators of their corresponding population parameters, such that the expected value of θ̂ does not equal θ. Second, the standard error of θ̂ may not be independent of the value of θ; consequently, even for unbiased estimates the lower and upper percentile cut-offs may not be the same number of standard-error units from θ̂ (DiCiccio & Efron, 1996, p. 194). The bias-corrected-and-accelerated (BCa) bootstrap method, which was introduced by Efron (1987), reduces both sources of inaccuracy by adjusting the percentile cut-offs in the distribution of the resampled θ̂* for both the bias and the rate of change, called the acceleration, of θ̂ with respect to change in θ.
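The percentile method just described can be sketched in a few lines of base R. This is an illustrative hand-rolled version using hypothetical data, not the implementation that bootES uses:

```r
# A minimal percentile bootstrap CI for the mean, assuming a numeric
# vector x of hypothetical data; bootES itself relies on the boot package.
set.seed(1)
x <- rnorm(50, mean = 100, sd = 15)            # hypothetical sample
R <- 2000                                      # number of resamples
theta.star <- replicate(R, mean(sample(x, replace = TRUE)))
ci <- quantile(theta.star, probs = c(0.025, 0.975))
ci                                             # 2.5th and 97.5th centiles
```

Each call to sample() draws N values with replacement from x, so theta.star holds 2000 resampled means whose 2.5th and 97.5th centiles form the interval.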
The details of the BCa method are provided in Appendix 2.

The BCa method has two important properties that give it an advantage over other methods in many contexts (Efron, 1987). First, the CI coverage errors for the BCa method go to zero with increases in sample size N at a rate of 1/N, whereas the tail coverage errors for traditional and simple percentile bootstrap methods go to zero at the much slower rate of 1/√N. This is one reason why BCa intervals outperform traditional and percentile bootstrap CIs in most applications. For both normal and nonnormal population distributions with sample sizes of roughly 20 or more, Monte Carlo research has shown that BCa intervals yield small coverage errors for means, medians, and variances (Efron & Tibshirani, 1993; Lei & Smith, 2003), correlations (DiCiccio & Efron, 1996; Lei & Smith, 2003; Li, Cui, & Chan, 2012; Padilla & Veprinsky, 2012), and Cohen's d (Algina et al., 2006; Kelley, 2005). The magnitude of the coverage errors, and whether they are liberal or conservative, depends on the particular statistic and the population distribution, and BCa intervals can be outperformed by other methods in particular circumstances (e.g., see Hess, Hogarty, Ferron, & Kromrey, 2007). Nevertheless, their relative performance is typically quite good, and they are recommended for general use in a wide variety of applications (Carpenter & Bithell, 2000; Efron & Tibshirani, 1993, p. 188). A second important property of the BCa method is that, like the percentile method, it is transformation respecting. This means that for a monotonic transformation f of the statistic θ, the endpoints θ_low and θ_hi of the CI on θ can be transformed to find the endpoints f(θ_low) and f(θ_hi) of the CI on f(θ).
This is important because the standardized effect sizes implemented in bootES that have not been studied directly are monotonic transformations of Cohen's d (defined below), which has been studied extensively in Monte Carlo simulations. Thus, the coverage performance found in previous Monte Carlo results should hold for our other effect sizes as well. The BCa method was implemented by Canty and Ripley (2012) in the boot.ci function in the R boot package. BootES uses the boot.ci implementation of BCa as its default method. Other bootstrap methods are available, as described in the options section below.

Limitations and Sample Sizes

Bootstrap CIs do not cure the problems of small sample sizes. Although BCa CIs may outperform traditional CIs for small samples from non-normal populations (see, e.g., Davison & Hinkley, 1997, pp. 230-231), their coverage for small sample sizes can still differ substantially from the nominal 1 – α. This is because bootstrap CIs depend heavily on sample values in the tails of the sample distribution. The several smallest and largest resampled statistics will be computed from those resamples that happen to draw observations primarily from the tails of the distribution of sample values. For small samples the tails are sparsely populated, so the percentile cut-offs could be determined by a very small number of sample values. The more extreme the confidence level (i.e., the smaller the α), the more the cut-off will be determined by a small number of extreme sample values. The BCa method can exacerbate the problem because the adjusted cut-offs may move even further into the tails of the distribution. When this is a problem, increasing the sample size and the number of bootstrap resamples can improve coverage. Monte Carlo simulations of BCa bootstrap CIs have been conducted for a wide variety of statistics. Here we briefly summarize four of these, to provide a sense of BCa CI coverage in small samples.
Kelley (2005) found excellent coverage in Monte Carlo simulations of BCa intervals for Cohen's d across a range of population distributions; for sample sizes of n1 = n2 = 15 or more, coverage probabilities for 95% CIs ranged from 94.1% to 96.5%. For example, for the difference between the means of two normal distributions with equal means and variances, the coverage probability was 95.96% (Kelley, 2005, Table 1). Algina et al. (2006) also found very good coverage for Cohen's d when n1 = n2 = 25, for all but their most nonnormal populations. (Notably, Algina et al. refrained from simulating smaller samples because they "wanted to avoid encouraging the use of small sample sizes.") Davison and Hinkley (1997, Table 5.8) computed the ratio of two means, each drawn from a different gamma distribution, which had a highly nonnormal population distribution. They observed a respectable 93.2% coverage for 95% BCa intervals when n1 = n2 = 25. Finally, for nominal 99% CIs, Carpenter and Bithell (2000, Table IV) observed a 98.94% coverage rate for BCa intervals on means drawn from an inverse exponential distribution (a moderately nonnormal distribution) when n = 20. However, with a sample size of n = 5 the error rate was more than 1.8% in the upper tail alone.

As a consequence of these and other simulation results, for 95% BCa CIs we recommend sample sizes of approximately 20 or more in each group. To be safe, larger sample sizes, or robust effect-size estimates (Algina et al., 2006), should be considered when (a) the distribution of sample values is strongly nonnormal, (b) a smaller α is desired, such as for a 99% CI, and the distribution of sample values is noticeably nonnormal, or (c) the effect size is large (and therefore may have arisen from a nonnormal distribution of population effect sizes).

Number of Resamples

Bootstrap methods are computationally intensive in that they require a large number of resamples and calculations.
When these methods were being developed in the 1980s the computing time required to generate several hundred resamples was substantial, and some effort was devoted to determining the smallest number of resamples that could yield adequate CI coverage (e.g., Hall, 1986). For a 90% CI, Efron and Tibshirani (1993, p. 275) recommended that the number of resamples should be "≥ 500 or 1000." Davison and Hinkley (1997, p. 202) said "if confidence levels 0.95 and 0.99 are to be used, then it is advisable to have [the number of resamples] = 999 or more, if practically feasible." Today, such numbers take only seconds, and it is practically feasible to use much larger numbers of resamples. In bootES we have implemented a default of 2000 resamples, but users may change this number as described in the options section below.

Comparison With Other Software

To our knowledge, bootES is the only software that implements bootstrap CIs for all of the effect sizes and data structures described below without requiring users to write specialized code. BootES makes use of the bootstrapping implementations in the boot() package, which was originally written for S-Plus by Canty (see Canty, 2002), and was later ported to R by Ripley (Canty & Ripley, 2012). The boot() package is extremely flexible in that it can perform bootstrap resampling for any user-defined function that is written for a particular data structure. Thus, boot() is not constrained to the statistics or data structures described below. However, beyond the simplest cases, writing appropriate functions for use with boot() is not trivial. The advantage of using bootES is that the effect-size functions are built in for the types of data structures commonly used by social scientists.
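To see what boot() asks of the user, here is a sketch of the custom statistic function one would have to write just to bootstrap a simple mean; the data and the function name are hypothetical:

```r
# Using boot() directly requires a statistic function that accepts the
# data and a vector of resample indices.
library(boot)
set.seed(1)
x <- rnorm(30)                                 # hypothetical data
mean.fun <- function(d, i) mean(d[i])          # statistic on resample i
b <- boot(x, statistic = mean.fun, R = 2000)
boot.ci(b, conf = 0.95, type = "bca")          # BCa interval, as in bootES
```

For more complex effect sizes (contrasts over groups, differences between correlations), the statistic function must also encode the data structure, which is the step bootES automates.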
BootES contains optional arguments to control those built-in functions, but in typical applications bootES keeps the number of user-specified options to a minimum by making use of the structure of the data to help select the appropriate effect-size function.

Bootstrapping methods are becoming more widely available, and all of the major commercial statistical programs provide their own scripting languages, which allow users to write their own functions and programs. Thus, users who know those scripting languages could write programs to perform any of the computations described below. However, no commercial software offers the breadth of bootES's built-in effect-size measures and their bootstrap CIs. SYSTAT (version 13), JMP Pro (version 10), and the bootstrapping add-on module for IBM SPSS (version 20) allow users to compute bootstrap CIs for the summary statistics and parameter estimates computed by many of their procedures. SAS (version 9) provides a customizable bootstrapping macro, but the user needs to know how to program in the SAS macro language. Stata (version 12) has a bootstrap command that allows the user to find bootstrap CIs for user-defined functions of the saved results of existing commands. Resampling Stats (version 4) offers Microsoft Excel and Minitab add-ins that compute bootstrap CIs for user-defined functions. Users already familiar with a commercial program might find that its bootstrapping capability meets their needs. If not, bootES offers "one-stop shopping" in a free package.

Example Data

To illustrate each type of analysis below we will use the example data shown in Table 1. (This is a toy data set for illustration purposes; we do not mean to encourage the use of such small sample sizes.) A text file containing these data is provided with the bootES package, and can be imported using the command shown in Appendix 1.
Users without access to the data file may enter the data in a spreadsheet and save it as a comma-separated-value (csv) file; once the csv file exists it can be imported into R in the normal manner as described in Appendix 1. (Readers who are new to R may wish to read Appendix 1 before continuing to the next section.) In R the imported data will be contained in an object called a data frame, and the data frame must be given a name by the user. In the import command described in Appendix 1 we called our example data frame "myData," and we used that name throughout our examples.

Table 1
Example Data, Contained in the Data Frame "myData."

Gender  Condition  Dosage  Meas1  Meas2  Meas3
female  A           30     212    399    264
female  A           30     290    284    372
female  A           30     310    113    169
female  A           30     133    353    513
female  A           30     387    357    571
female  B           60     365    203    388
female  B           60     198    262    414
female  C          120     192    100    331
female  C          120     215    297    382
female  C          120     154    339    439
male    A           30     148    232    201
male    A           30     322    300    128
male    A           30     249    463    427
male    B           60     135     44    342
male    B           60     217    271    376
male    B           60     243    252    318
male    C          120     424    294    456
male    C          120     337    184    296
male    C          120     387    407    259
male    C          120     481    125    130

To illustrate the use of units of measurement below, let's pretend that Dosage is measured in milligrams and that the three dependent measures ("Meas1," "Meas2," and "Meas3") are task completion times in seconds.

CIs for Means, and General Options

BootES provides a variety of options that allow the user to customize the type of CI and output. In this section we illustrate some of these options with CIs on means. In most instances users can simply omit these options and use the default values. As in the R console, all R commands and output are shown below in Monaco font.

CIs on Means

When a single variable name (a single column of numbers) is given as an argument, bootES will find the mean and 95% bootstrap BCa CI for that variable.
Suppose, for example, that we wish to find the CI for the grand mean of measure 1 ("Meas1") in myData. This column can be picked out using standard R notation, in this instance, "myData$Meas1" (see Appendix 1 for an explanation of this notation). To execute a command in R, simply type (or paste and edit) the command after the '>' prompt in the R console, and press return. The mean can be found with the built-in R command:

> mean( myData$Meas1 )
[1] 269.95

To find the CI for this mean, one way to write the bootES command is

> bootES( data = myData$Meas1 )

We can also omit the "data =" part, so long as the data name appears in the first argument to bootES. We do this in all of the example commands below. Thus, a simpler command is

> bootES( myData$Meas1 )

Executing this command results in output like the following.

95.00% bca Confidence Interval, 2000 replicates
Stat       CI (Low)   CI (High)  bias       SE
269.950    228.716    315.787    0.028      21.918

The mean of measure 1 is 269.95 s, with a 95% CI from 228.72 s to 315.79 s. These values can be copied from R and pasted into, say, the results section of a manuscript. The bias, 0.028 s, is the difference between the mean of the resamples and the mean of the original sample. The "SE" is the standard error, that is, the standard deviation of the resampled means. Above, we said "output like the following" because bootstrap resampling involves taking repeated random samples of one's data, so these intervals will be slightly different each time. Readers who wish to replicate our numbers in this article exactly may do so by executing the command set.seed(1) immediately before each use of the bootES function. This command resets the seed of R's random number generator.

General Options

Number of resamples. As discussed above, by default bootES resamples the data 2000 times. This can be changed using the option "R =," followed by the desired number of resamples.
For example, to find the 95% CI for measure 1 based on 5000 resamples, the command would be

> bootES( myData$Meas1, R = 5000 )

95.00% bca Confidence Interval, 5000 replicates
Stat       CI (Low)   CI (High)  bias       SE
269.950    229.800    317.284    0.522      22.106

The difference between these estimates and those for R = 2000 was small in this instance. The difference in computing time was also negligible (on a 2.93 GHz computer, 2000 replications took 0.065 s and 5000 replications took 0.147 s).

Confidence level. The default confidence level for CIs in bootES is 95%. This level can be changed using the "ci.conf" option. For example, to find the 99% CI for the mean of measure 1, use

> bootES( myData$Meas1, ci.conf = 0.99 )

99.00% bca Confidence Interval, 2000 replicates
Stat       CI (Low)   CI (High)  bias       SE
269.950    215.225    333.321    0.028      21.918

As expected, this CI is wider than the 95% CI above.

CI method. As discussed above, bootES uses by default the BCa method for computing bootstrap CIs, which has been shown to have excellent coverage in a wide range of applications. We recommend that readers use this method unless there is a priori reason to think another method would be superior for a particular data set. In such a case, the other methods offered by the R boot package (Canty & Ripley, 2012) are also available as options: "basic," percentile ("perc"), normal approximation ("norm"), and studentized ("stud"). (Note that the studentized method requires specification of bootstrap variances as an argument; see the boot.ci help page in R for more information on that option.) These methods are described in Davison and Hinkley (1997, Chapter 5). (The option "none" is also available if one wishes to generate plots of resampled values without computing CIs.)
To illustrate, if one wished to use the normal approximation method to find the CI for the mean of measure 1, set the option "ci.type" to "norm":

> bootES( myData$Meas1, ci.type = "norm" )

95.00% norm Confidence Interval, 2000 replicates
Stat       CI (Low)   CI (High)  bias       SE
269.950    226.963    312.881    0.028      21.918

Plots. Behind the scenes, the boot() function creates an object that contains the resampled statistics along with plotting information. As with boot(), bootES users may save this object under some arbitrary name, for example, myBoots = bootES(...), and then extract the resampled statistics or pass this object by name as an argument to the plot function to generate a histogram and Normal quantile-quantile plot of the resampled statistic. Alternatively, in bootES one can easily generate these plots using the option "plot = TRUE."

> bootES( myData$Meas1, plot = TRUE )

The numerical output will be the same as in our first example above, but the desired plots will be generated in a graphics window. The plots for this example are shown in Figure 1 and indicate that the resampled means are approximately normally distributed.

Figure 1. For 2000 resampled means of example measure 1 (labeled "t*" by default), the left panel shows the histogram, and the right panel shows the Normal quantile-quantile plot.

Between-Subjects (Group) Effect Sizes

In many instances, effects are defined across group means. Because subject is typically a random factor, resampling methods must randomly resample subjects within groups, while respecting the number of subjects in each group in the original sample. This properly simulates the uncertainty in the estimate of each group mean, and it is the resulting unweighted resampled means that bootES uses to estimate an effect size between groups. Thus, bootES provides unweighted-means effect sizes, which are appropriate for most scientific questions (Howell & McConaughy, 1982).
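The within-group resampling scheme described above can be sketched as follows; this illustrates the idea with hypothetical data and is not bootES's internal code:

```r
# Resample subjects with replacement *within* each group, preserving
# the original group sizes (stratified resampling).
set.seed(1)
g <- rep(c("female", "male"), times = c(10, 10))   # hypothetical groups
y <- rnorm(20)                                     # hypothetical scores
resampleWithinGroups <- function(y, g) {
  idx <- unlist(lapply(split(seq_along(y), g),
                       function(i) sample(i, replace = TRUE)))
  y[idx]
}
y.star <- resampleWithinGroups(y, g)   # one resample; group sizes preserved
```

Because the indices are resampled separately within each level of g, every resample contains exactly as many observations per group as the original sample.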
Between-Subjects: Unstandardized Effect-Size Measures

An unstandardized effect-size measure is one that expresses the size of an effect in units of measurement, typically the original units of measurement in the research. For example, a study that measures response times in milliseconds might report the size of the observed effects in milliseconds. These effect sizes are useful primarily when the original units are meaningful and readily interpretable by readers.

Contrasts. A contrast, also called a comparison of means (Maxwell & Delaney, 2004), is a linearly weighted combination of means, defined for a sample as

C = Σ λj Mj, with Σ λj = 0,    (1)

where the Mj are the sample estimates of the population group means μj. Each mean is multiplied by its corresponding contrast weight, λj, and the sum of the λj must be zero. The sign of C indicates whether the pattern in the group means is in the direction predicted by the contrast weights (+) or in the opposite direction (–).

Two groups. For a difference between two means, the λs in Equation 1 are simply +1 and –1. In bootES the user must specify the groups that are involved in the contrast using the "contrast" option, wrapped within R's c() (concatenation) function. Each group label is followed by an equal sign and its contrast weight. For example, to find the difference between the means of females and males one could use the option "contrast = c(female = 1, male = -1)." As a convenience, when only two groups are involved in the contrast, the user may omit the contrast weights but add single or double quotes: for example, "contrast = c('female', 'male')." We must also tell bootES which column contains the "data" (the dependent, or outcome, variable) and which column contains the grouping variable (the independent, or categorical predictor, variable). These are specified using the 'data.col' and 'group.col' options, respectively.
For example, suppose we wish to find CIs for the mean difference in measure 1 between females and males in the example data set. In this case the group.col is "Gender" and the data.col is "Meas1." Pulling these options together, we use the following command:

> bootES( myData, data.col = "Meas1", group.col = "Gender",
          contrast = c("female", "male") )

User-specified lambdas: (female, male)
Scaled lambdas: (-1, 1)
95.00% bca Confidence Interval, 2000 replicates
Stat       CI (Low)   CI (High)  bias       SE
48.700     -37.223    132.422    -0.477     43.223

Note that the command was lengthy, and wrapped onto a second line. That makes no difference in R; all input prior to pressing the return key will be interpreted as part of the same command. The means of the groups can be found easily using a built-in R command, by(), which finds a statistic "by" groups. The first argument is the data column, the second argument is the grouping column, and the third argument is the desired statistic:

> by( myData$Meas1, myData$Gender, mean )
myData$Gender: female
[1] 245.6
---------------------------------------
myData$Gender: male
[1] 294.3

Three or more groups. All else equal, a contrast increases with an increasing correspondence between the pattern in the contrast weights and the pattern in the means. Consequently, a raw contrast C may serve as a measure of effect size. However, the magnitude of C is also affected by the arbitrary scaling of the choice of weights, so the choice of weights can affect the reader's ability to interpret the contrast as an effect size. Therefore, when reporting an unstandardized contrast as an effect-size measure, it is important to choose contrast weights such that they preserve the original units of measurement (or at least some non-arbitrary unit). This can be accomplished by making the positive contrast weights sum to +1 and the negative contrast weights sum to –1.
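This scaling rule (positive weights summing to +1, and hence negative weights to –1) can be verified by hand in R; the dosage values here are the ones used in our example data:

```r
# Scale raw linear contrast weights so that the positive weights sum
# to +1, which preserves the original units of the dependent variable.
dosage <- c(A = 30, B = 60, C = 120)
w <- dosage - mean(dosage)         # raw linear weights: -40, -10, 50
w.scaled <- w / sum(w[w > 0])      # scaled weights: -0.8, -0.2, 1
```

Dividing by the sum of the positive weights leaves the pattern of the contrast unchanged while fixing its scale, so the contrast becomes a difference between two weighted means in the original units.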
The contrast can then be interpreted as a difference between two weighted means (i.e., weighted by the contrast weights), and it will express the effect in the original units of measurement of the dependent variable. To illustrate, suppose we wish to assess the increase in measure 1 in the example data (task completion time measured in seconds) as a linear function of dosage condition. The three conditions, A, B, and C, in the example data correspond to dosages of 30 mg, 60 mg, and 120 mg, respectively. To find linear contrast weights we can simply subtract the mean of these dosages from each dosage. The mean is (30 + 60 + 120)/3 = 70, so the corresponding contrast weights are 30 – 70 = –40, 60 – 70 = –10, and 120 – 70 = 50, respectively. Using the labels for the variable Condition within the data frame, our contrast specification becomes "contrast = c(A = -40, B = -10, C = 50)." These contrast weights capture the same effect as would, say, the weights (–4, –1, 5), but the former will lead to an unstandardized effect size ten times as large as the latter for the same data. To express C in the original units of measure 1, the weights must be scaled to (–0.8, –0.2, 1). This scaling of contrast weights is done by default in bootES, for any weights provided by the user. Both the original and the scaled lambda weights are reported in the output:

> bootES( myData, data.col = "Meas1", group.col = "Condition",
          contrast = c(A = -40, B = -10, C = 50) )

User-specified lambdas: (-40, -10, 50)
Scaled lambdas: (-0.8, -0.2, 1)
95.00% bca Confidence Interval, 2000 replicates
Stat       CI (Low)   CI (High)  bias       SE
61.437     -39.026    157.701    0.188      51.047

The value of C, 61.44 s, is in the original units of measure 1. Weight scaling can be overridden by the user with the option 'scale.weights = FALSE.'

Slopes. A special case of a linear contrast is one that gives the slope of the relationship
between the outcome variable and the predictor variable. A slope expresses the linear relationship as the number of units of change in the outcome variable per one-unit change in the predictor variable, in the original units of measurement. For example, if one's hypothesis predicted a linear increase in mean task completion time (in seconds) as drug dosage increased across three drug dosage conditions (e.g., 30 mg, 60 mg, and 120 mg), one might wish to express the effect, with its associated CI, as a slope with units s/mg. This can be accomplished by scaling the contrast weights appropriately. One obtains each scaled weight by demeaning the values, X, on the predictor variable (e.g., the dosages), and then dividing each by the sum of squares (SS) of those demeaned values:

λ_j = (X_j - X̄) / Σ_j (X_j - X̄)²,    (2)

This is done automatically in bootES when the 'slope.levels' option is used instead of the 'contrast' option. To illustrate, suppose one's grouping variable is non-numeric (e.g., conditions A, B, and C), but one knows their corresponding numeric values (e.g., 30, 60, and 120 mg). As with contrasts more generally, the mapping of group labels to numeric values (A = 30, B = 60, C = 120) can be given in the 'slope.levels' argument, and the appropriate slope weights will be used:

> bootES(myData, data.col = "Meas1", group.col = "Condition",
    slope.levels = c(A = 30, B = 60, C = 120))

User-specified lambdas: (-0.00952, -0.00238, 0.0119)
Scaled lambdas: (-0.00952, -0.00238, 0.0119)
95.00% bca Confidence Interval, 2000 replicates

   Stat  CI (Low)  CI (High)   bias     SE
  0.731    -0.465     1.877   0.002  0.608

The observed slope is 0.73 s/mg, with a 95% CI from -0.47 s/mg to 1.88 s/mg. Alternatively, when the grouping variable designated by 'group.col' is numeric, those numeric values will be used as the levels of the predictor variable.
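The slope weights of Equation 2 are easy to verify by hand. The sketch below is in Python rather than R, and is not the bootES implementation; it applies Equation 2 to the dosage values above and reproduces the lambdas shown in the output:

```python
# Sketch (Python, not bootES) of Equation 2: demean the predictor values,
# then divide each demeaned value by their sum of squares.
def slope_weights(levels):
    mean = sum(levels) / len(levels)              # e.g., (30+60+120)/3 = 70
    ss = sum((x - mean) ** 2 for x in levels)     # e.g., 1600+100+2500 = 4200
    return [(x - mean) / ss for x in levels]

lams = slope_weights([30, 60, 120])               # dosages for A, B, C
print([round(l, 5) for l in lams])                # [-0.00952, -0.00238, 0.0119]

# Applying these weights to the predictor values themselves gives exactly 1,
# which is why applying them to the group means yields a per-mg slope.
print(round(sum(l * x for l, x in zip(lams, [30, 60, 120])), 10))  # 1.0
```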
For example, the 30, 60, and 120 mg dosage levels are contained in the "Dosage" column of myData, which can be specified in the slope.levels option:

> bootES(myData, data.col = "Meas1", slope.levels = "Dosage")

The output would be identical to that in the previous example.

Between-Subjects: Standardized Effect-Size Measures

It is often preferable to report standardized, or "scale-free," measures of effect size. Such unitless measures can be meaningful to readers even when they have little intuitive understanding of the effects in the original units. Standardized measures also allow easier assessment and combination of effects in meta-analyses (Glass, 1976; Hedges & Olkin, 1985). We have implemented several standardized effect-size measures in bootES.

Cohen's d-type effect-size measures. One approach to standardizing an effect size is to express the effect in standard deviation units. Typically this means dividing the unstandardized effect size by an estimate of the population standard deviation. Such effect sizes, including the four in this section, are variations on what is typically called "Cohen's d," after Cohen (1969).

Cohen's δ and d. Cohen (1969) was concerned primarily with using guesses about population effect sizes to estimate the power of significance tests, and with estimating sample size requirements for achieving desired levels of power. Consequently, he defined his effect-size measures in terms of population parameters. For the difference between two means, the effect size is found by
Publication date: 2013